1. Condition number: Goldstine-von Neumann problem and Edelman-Spielman-Teng conjecture

Definition 1.1. For an $n \times n$ matrix $A$, the condition number $\kappa(A)$ is defined as

$$\kappa(A) := \|A\| \, \|A^{-1}\|,$$

where $\|\cdot\|$ denotes the operator norm. (If $A$ is not invertible, set $\kappa(A) = \infty$.) Another, and more convenient, way to write $\kappa(A)$ is

$$\kappa(A) = \sigma_1(A)/\sigma_n(A),$$

where $\sigma_1(A) \ge \dots \ge \sigma_n(A)$ are the singular values of $A$. We will use this definition in the rest of the discussion.
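As a small illustration of Definition 1.1, the following sketch computes the condition number from the singular values and compares it with the product of operator norms. The helper name `condition_number` is ours, introduced only for this example.

```python
import numpy as np

def condition_number(A: np.ndarray) -> float:
    """Return kappa(A) = sigma_1(A) / sigma_n(A); infinity if A is singular."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values in descending order
    if s[-1] == 0.0:
        return float("inf")
    return float(s[0] / s[-1])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    via_svd = condition_number(A)
    # Operator norm of A times operator norm of its inverse.
    via_norms = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
    print(via_svd, via_norms)  # the two values should agree up to rounding
```

The singular-value form avoids forming the inverse explicitly, which is one reason it is the more convenient of the two expressions.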

The condition number is one of the most important parameters in numerical linear algebra, as it plays a crucial role in the running time and accuracy of most linear algebraic algorithms (see, e.g., [1]). As is well known, linear algebraic algorithms are of fundamental importance in the study of complex networks, and of large data sets in general.

In practice, the smaller the condition number is, the better. Matrices with small condition number (polynomial in $n$, say) are called well-conditioned, while those with large condition number (super-polynomial in $n$) are called ill-conditioned.

The  problem  of  estimating  the  condition  number  of  a  random  matrix  was  first  posed  by 
Goldstine  and  von  Neumann  [15,  16],  who  tried  to  analyze  the  performance  of  one  of  the  very 
first  computers  on  large  matrices  in  the  1940s. 

Let us consider the condition number of a random matrix $M_n$. For convenience, we assume that the entries of $M_n$ are iid random variables with mean 0 and variance 1. This assumption can be weakened considerably, but notice that it already covers the gaussian model and also the most important discrete model, the Bernoulli (or Rademacher) model (where the entries independently take the values $\pm 1$, each with probability 1/2).

It is well known that if we assume furthermore that the entries have light tails (for a more specific statement, see [24]), then the largest singular value $\sigma_1(M_n)$ is strongly concentrated around $2\sqrt{n}$. Since

$$\kappa(M_n) = \sigma_1(M_n)/\sigma_n(M_n),$$

the problem reduces to the study of the least singular value $\sigma_n(M_n)$.

When the entries of $M_n$ are standard gaussian, the distribution of $\sigma_n$ was computed by Edelman [13], confirming the prediction of Goldstine and von Neumann.

Theorem 1.2 (Limiting distributions for gaussian models). For any fixed $t > 0$, we have

$$\mathbf{P}(n\sigma_n(M_n(g_{\mathbf{R}}))^2 \le t) = \int_0^t \frac{1+\sqrt{x}}{2\sqrt{x}}\, e^{-(x/2+\sqrt{x})}\, dx + o(1). \quad (1)$$
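For numerical comparisons it helps to note that the integral in (1) has a closed form. The following short check assumes the integrand is $\frac{1+\sqrt{x}}{2\sqrt{x}} e^{-(x/2+\sqrt{x})}$, as in Edelman's result:

```latex
\[
\frac{d}{dx}\Bigl(-e^{-(x/2+\sqrt{x})}\Bigr)
  = \Bigl(\frac{1}{2} + \frac{1}{2\sqrt{x}}\Bigr) e^{-(x/2+\sqrt{x})}
  = \frac{1+\sqrt{x}}{2\sqrt{x}}\, e^{-(x/2+\sqrt{x})},
\]
\[
\text{so}\qquad
\int_0^t \frac{1+\sqrt{x}}{2\sqrt{x}}\, e^{-(x/2+\sqrt{x})}\, dx
  = 1 - e^{-(t/2+\sqrt{t})}.
\]
```

In particular, the limiting probability behaves like $\sqrt{t}$ for small $t$.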

Edelman's proof used the integration method and relied heavily on the explicit formula for the joint distribution of the eigenvalues of a random gaussian matrix, and thus cannot be extended to other cases. On the other hand, both Edelman and Spielman and Teng [27] made an explicit conjecture that the distribution of $\sigma_n$ is the same in the Bernoulli case.

To illustrate the difficulty of the problem, let us mention that it is already quite non-trivial to show that the least singular value of a random Bernoulli matrix is, with probability tending to one as $n$ tends to infinity, positive. This was done by Komlós in 1967 [19, 20], using combinatorial arguments.

The PI's first major finding is the confirmation of the Edelman-Spielman-Teng conjecture. In fact, he and T. Tao discovered that the limiting distribution of the least singular value (and, as a consequence, that of the condition number) of a random matrix is universal, i.e., does not depend on the distribution of the entries.

Theorem 1.3 (Universality for the least singular value). [30] Let $\xi$ be a random variable with mean 0 and variance 1 and suppose $\mathbf{E}|\xi|^{C_0} < \infty$ for some sufficiently large absolute constant $C_0$. Let $M_n$ be a random matrix with entries being iid copies of $\xi$. Then for all $t > 0$, we have

$$\mathbf{P}(n\sigma_n(M_n)^2 \le t) = \int_0^t \frac{1+\sqrt{x}}{2\sqrt{x}}\, e^{-(x/2+\sqrt{x})}\, dx + O(n^{-c}). \quad (2)$$

The implied constants in the $O(\cdot)$ notation depend on $\mathbf{E}|\xi|^{C_0}$ but are uniform in $t$.

The next figure shows an empirical demonstration of Theorem 1.3 for Bernoulli and for gaussian distributions.

Figure 1. Plotted above is the curve $\mathbf{P}(\sqrt{n}\,\sigma_n(M_n(\xi)) \le x)$, based on data from 1000 randomly generated matrices with $n = 100$. The dotted curve was generated with $\xi$ a random Bernoulli variable, taking the values $+1$ and $-1$ each with probability 1/2; and the dashed curve was generated with $\xi$ a gaussian normal random variable. Note that the two curves are already close together in spite of the relatively coarse data.
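An experiment in the spirit of Figure 1 can be sketched as follows. This is not the code used for the figure; the function names are ours, and the limiting curve $1 - e^{-x^2/2 - x}$ is obtained from the integral in Theorem 1.3 by substituting $t = x^2$ (an assumption about the intended integrand).

```python
import numpy as np

def least_singular_values(n, trials, entry, rng):
    """Sample sqrt(n) * sigma_n(M_n) for `trials` random n x n matrices."""
    out = np.empty(trials)
    for t in range(trials):
        if entry == "gaussian":
            M = rng.standard_normal((n, n))
        elif entry == "bernoulli":
            M = rng.choice([-1.0, 1.0], size=(n, n))  # +1/-1 with probability 1/2
        else:
            raise ValueError(f"unknown entry distribution: {entry}")
        out[t] = np.sqrt(n) * np.linalg.svd(M, compute_uv=False)[-1]
    return out

def limiting_cdf(x):
    """Limit of P(sqrt(n) * sigma_n <= x), i.e. 1 - exp(-x^2/2 - x)."""
    x = np.asarray(x, dtype=float)
    return 1.0 - np.exp(-(x ** 2) / 2.0 - x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for entry in ("gaussian", "bernoulli"):
        s = least_singular_values(n=100, trials=1000, entry=entry, rng=rng)
        for x in (0.5, 1.0, 2.0):
            # Empirical CDF next to the limiting value.
            print(entry, x, np.mean(s <= x), float(limiting_cdf(x)))
```

With these sample sizes the empirical values for the two entry distributions should be close to each other and to the limiting curve, up to sampling error.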


Theorem 1.3 also holds in the complex case. In this case the distribution of the least singular value of a (complex) gaussian matrix is also available in Edelman's paper [13].


2.  Condition  number  of  randomly  perturbed  matrices 

Now  we  discuss  the  general  problem  of  determining  the  condition  number  of  A  +  Mn ,  where 
A  can  be  an  arbitrary  (deterministic)  matrix.  This  problem  is  of  importance  in  the  study  of 
large  data  sets  with  random  perturbation.  In  particular,  it  played  a  crucial  role  in  the  theory 
of  smoothed  analysis  introduced  by  Spielman  and  Teng  [27] . 

In order to bound the condition number $\kappa(A + M_n)$ from above, we want to bound both $\|A + M_n\|$ and $\|(A + M_n)^{-1}\|$. Bounding $\|A + M_n\|$ is easy. In fact, we expect that in most cases the noise does not change $\|A\|$ significantly. By the triangle inequality,

$$\|A\| - \|M_n\| \le \|A + M_n\| \le \|A\| + \|M_n\|.$$

On the other hand, in most models of random matrices, one can show that $\|M_n\| = O(\sqrt{n})$ with very high probability, so if $\|A\| = n^{O(1)}$, then $\|A + M_n\|$ is, with high probability, polynomially bounded. In practice, the entries of high-dimensional matrices are usually not too large (compared to their sizes), so this assumption about $A$ is almost always satisfied, and we are going to assume it throughout our discussion.

The problem is thus to bound the norm of the inverse $\|(A + M_n)^{-1}\|$. Spielman, Sankar and Teng [27, 26] proved the following result for the case when $M_n$ is gaussian (see also [3] for an extension).

Theorem 2.1. Let $A$ be an arbitrary $n$ by $n$ matrix and $M_n$ be a random matrix with iid gaussian real entries. Then for any $x > 0$,

$$\mathbf{P}(\|(A + M_n)^{-1}\| \ge x\sqrt{n}) \le 1.823/x.$$
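The tail bound can be probed numerically. The sketch below reads Theorem 2.1 as the statement $\mathbf{P}(\sigma_n(A + M_n) \le 1/(x\sqrt{n})) \le 1.823/x$ (our assumption about the intended normalization, using $\|(A+M_n)^{-1}\| = 1/\sigma_n(A+M_n)$); the function name and the choices of $A$ are illustrative only.

```python
import numpy as np

def inverse_norm_tail(A, x, trials, rng):
    """Monte Carlo estimate of P(||(A + M_n)^{-1}|| >= x * sqrt(n)), M_n gaussian."""
    n = A.shape[0]
    hits = 0
    for _ in range(trials):
        M = rng.standard_normal((n, n))
        sigma_min = np.linalg.svd(A + M, compute_uv=False)[-1]
        # ||(A + M)^{-1}|| = 1 / sigma_min, so the event is sigma_min * x * sqrt(n) <= 1.
        if sigma_min * x * np.sqrt(n) <= 1.0:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, x = 50, 4.0
    for name, A in (("A = 0", np.zeros((n, n))), ("A = 3I", 3.0 * np.eye(n))):
        est = inverse_norm_tail(A, x, trials=1000, rng=rng)
        print(name, "estimate:", est, "bound:", 1.823 / x)
```

The estimates should fall below the bound for every deterministic $A$; the experiment says nothing, of course, about non-gaussian noise.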


One of the main goals of the proposal was to find an analogue of this result for general random matrices. The problem was posed by Spielman several years ago [27].

The case $A = 0$ was addressed by Rudelson and Vershynin [24], but their proof does not extend to general $A$. This is not a coincidence, as it has turned out that there is a sharp distinction between the two cases. In the base case $A = 0$, the above theorem and [24] show that the behavior of $\sigma_n(M_n)$ does not depend on the distribution of the entries of $M_n$ (as long as they have mean 0, variance 1 and bounded moments). On the other hand, when $A$ is not zero, Tao and the PI found an example of $A$ such that $\|(A + M_n)^{-1}\|$, with $M_n$ random Bernoulli, does not satisfy the bound in Theorem 2.1. This means that one cannot expect Theorem 2.1 to hold for non-gaussian variables and, more importantly, that $A$ should play a role in the inequality.

With this observation in mind, Tao and the PI were able to prove the following theorem [37].

Theorem 2.2. Let $x$ be a random variable with mean zero and bounded second moment, and let $\gamma \ge 1/2$, $C \ge 0$ be constants. Then there is a constant $c$ depending on $x$, $\gamma$, $C$ such that the following holds. Let $M_n$ be the random matrix of size $n$ whose entries are iid copies of $x$, and let $A$ be a deterministic matrix satisfying $\|A\| \le n^{\gamma}$. Then

$$\mathbf{P}(\sigma_n(A + M_n) \le n^{-(2C+1)\gamma}) \le c\,\bigl(n^{-C+o(1)} + \mathbf{P}(\|M_n\| \ge n^{\gamma})\bigr).$$

This theorem sharpens and extends several existing results, such as those of Tao and the PI [28, 29] and Rudelson and Vershynin [24].


3.  Computing  eigenvectors  with  noise 

An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical analysis) is to compute the first few singular vectors of a large matrix. Among others, this problem lies at the heart of PCA (Principal Component Analysis), which has a very wide range of applications (for many examples, see [18, 21] and the references therein).

The basic setting of the problem is as follows:

Problem 3.1. For a matrix $A$ of size $n \times n$ with singular values $\sigma_1 \ge \dots \ge \sigma_n \ge 0$, let $v_1, \dots, v_n$ be the corresponding (unit) singular vectors. Compute $v_1, \dots, v_k$, for some $k \le n$.


Typically $n$ is large and $k$ is relatively small. As a matter of fact, in many applications $k$ is a constant independent of $n$. For example, to obtain a visualization of a large set of data, one often sets $k = 2$ or 3. The assumption that $A$ is a square matrix is for convenience, and our analysis can be carried out with minor modifications for rectangular matrices.

The matrix $A$, which represents some sort of data, is often perturbed by random noise. Thus, one typically works with $A + M_n$, where $M_n$, as usual, denotes a random matrix. A natural and important problem is to estimate the influence of noise on the vectors $v_1, \dots, v_k$. We denote by $v_1', \dots, v_k'$ the first $k$ singular vectors of $A + M_n$.

For the sake of presentation, we restrict ourselves to the case $k = 1$ (the first singular vector). The following question is of importance:

Question 3.2. When is $v_1'$ a good approximation of $v_1$?


A traditional way to measure the distance between two vectors $v$ and $v'$ is to look at $\sin\angle(v, v')$, where $\angle(v, v')$ is the angle between the vectors, taken in $[0, \pi/2]$ (see [14]). (One can of course also use $\|v - v'\|$, but we choose to follow the literature.) To make the problem more quantitative, let us fix a small parameter $\varepsilon > 0$, which represents a desired accuracy. Our question now is to find a sufficient condition on the matrix $A$ which guarantees that $\sin\angle(v_1, v_1') \le \varepsilon$. It has turned out that the key parameter to look at is the gap (or separation)

$$\delta := \sigma_1 - \sigma_2$$

between the first and second singular values of $A$. Classical results in numerical linear algebra yield the following.

Theorem 3.3 (Wedin theorem). For any given $\varepsilon > 0$, if $\delta \ge \|M_n\|/\varepsilon$, then

$$\sin\angle(v_1, v_1') \le \varepsilon.$$


In the case when $A$ and $A + M_n$ are Hermitian, this statement is a special case of the famous Davis-Kahan $\sin\theta$ theorem. Wedin [40] extended the Davis-Kahan theorem to non-Hermitian matrices, resulting in a general theorem that contains Theorem 3.3 as a special case (see [14, Chapter 8] for more discussion and history).
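The quantities in Theorem 3.3 are easy to compute directly. The sketch below measures $\sin\angle(v_1, v_1')$ for a perturbed matrix and reports the ratio $\|E\|/\delta$, where $E$ plays the role of the noise matrix; the helper names and the test matrix are our own choices, not part of the original experiments.

```python
import numpy as np

def top_right_singular_vector(B):
    """Unit right singular vector belonging to the largest singular value of B."""
    _, _, vt = np.linalg.svd(B)
    return vt[0]

def sin_angle(v, w):
    """Sine of the angle in [0, pi/2] between the lines spanned by v and w."""
    c = abs(np.dot(v, w)) / (np.linalg.norm(v) * np.linalg.norm(w))
    return float(np.sqrt(max(0.0, 1.0 - min(1.0, c) ** 2)))

def wedin_ratio(A, E):
    """Return (sin angle(v_1, v_1'), ||E|| / delta), with delta = sigma_1(A) - sigma_2(A)."""
    s = np.linalg.svd(A, compute_uv=False)
    delta = s[0] - s[1]
    observed = sin_angle(top_right_singular_vector(A), top_right_singular_vector(A + E))
    return observed, float(np.linalg.norm(E, 2) / delta)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    A = np.zeros((n, n))
    A[0, 0], A[1, 1] = 2000.0, 1000.0      # gap delta = 1000 (illustrative values)
    E = rng.standard_normal((n, n))         # gaussian noise, norm about 2 * sqrt(n)
    observed, ratio = wedin_ratio(A, E)
    print("sin angle:", observed, " ||E|| / delta:", ratio)
```

Taking the absolute value of the inner product makes the measurement insensitive to the sign ambiguity of singular vectors, which matches the convention that the angle lies in $[0, \pi/2]$.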

For most standard models of random matrices, $\|M_n\| \approx 2\sqrt{n}$. So we have

Corollary 3.4. If $\delta \ge 2\sqrt{n}/\varepsilon$, then with probability $1 - o(1)$

$$\sin\angle(v_1, v_1') \le \varepsilon.$$

Low dimensional data and improved bounds.

In a large variety of problems, the data is of small dimension, namely, $r := \operatorname{rank} A \ll n$ (see [12] and the references therein).

The PI and his students ran experiments for extremal cases with $r = O(1)$ and saw a vast improvement over the bound predicted in Corollary 3.4.


[Figure: empirical CDF.] $400 \times 400$ matrix of rank 2, with gap $\delta$ equal to 8; this gap is much less than that required by Wedin's bound. Indeed, we see that with probability close to 1, one can have $\varepsilon \approx 12^\circ$.

By Corollary 3.4, in order to obtain this accuracy, we need $\delta$ to be as large as $2\sqrt{400} \times 15 = 600$.


[Figure: empirical CDF.] $1000 \times 1000$ matrix of rank 2, with gap equal to 10.

The PI discovered that the bound on $\delta$ depends on the real dimension $r$ of the data, rather than the size $n$. He was able to prove [39]

Theorem 3.5. For any positive constant $\varepsilon$ there is a positive constant $C = C(\varepsilon)$ such that the following holds. Assume that $A$ has rank $r \le n^{.99}$ and $n \le \sigma_1$ and $\delta \ge C\sqrt{r \log n}$. Then with probability $1 - o(1)$

$$\sin\angle(v_1, v_1') \le \varepsilon. \quad (3)$$

Given a desired accuracy $\varepsilon$ (say, 1/100), this theorem means that when $r$ is small, $v_1'$ approximates $v_1$ under a much weaker assumption on the data matrix $A$, compared to Corollary 3.4.
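A low-rank experiment of the kind described above can be sketched as follows. This is not the PI's original code: the construction of $A$, the function names, and in particular the choice $\sigma_1 = 10n + \delta$, $\sigma_2 = 10n$ are our assumptions, since the text specifies only the size, the rank and the gap.

```python
import numpy as np

def low_rank_matrix(n, sigmas, rng):
    """Matrix of rank len(sigmas) with the given singular values (descending)
    and random orthonormal singular vectors; also returns v_1."""
    r = len(sigmas)
    U, _ = np.linalg.qr(rng.standard_normal((n, r)))  # n x r, orthonormal columns
    V, _ = np.linalg.qr(rng.standard_normal((n, r)))
    return U @ np.diag(sigmas) @ V.T, V[:, 0]

def sin_angle_after_noise(n, sigmas, rng):
    """sin of the angle between v_1(A) and v_1(A + E), E iid standard gaussian."""
    A, v1 = low_rank_matrix(n, sigmas, rng)
    E = rng.standard_normal((n, n))
    _, _, vt = np.linalg.svd(A + E)
    c = min(1.0, abs(float(np.dot(v1, vt[0]))))
    return float(np.sqrt(1.0 - c * c))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, delta = 400, 8.0
    sigmas = [10.0 * n + delta, 10.0 * n]   # assumed singular values, gap = 8
    sines = [sin_angle_after_noise(n, sigmas, rng) for _ in range(20)]
    print("median sin angle:", float(np.median(sines)))
    # Corollary 3.4 would need delta >= 2 sqrt(n) / eps; here 2 sqrt(n) / delta > 1,
    # so that corollary gives no information at this gap.
    print("2 sqrt(n) / delta:", 2.0 * np.sqrt(n) / delta)
```

Under these assumptions the observed angles are typically small even though the gap is far below $2\sqrt{n}$, which is the qualitative behavior Theorem 3.5 describes.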


4.  Spectrum  of  random  graphs 

A  popular  way  to  model  large  complex  networks  is  to  use  random  graphs.  Naturally,  the 
problem  of  understanding  the  spectrum  of  these  graphs  is  of  importance. 

The most studied models of random graphs are the Erdős-Rényi graph and the random regular graph (for definitions and more information, see [2]). While these graphs are not yet a good approximation for real-life complex networks, it has turned out that methods developed to study them are very useful in the investigation of more practical models.

Let $A_n$ be the adjacency matrix of the Erdős-Rényi random graph $G(n,p)$; $A_n$ is a random symmetric $n \times n$ matrix whose upper triangular entries are independent identically distributed (iid) copies of a real random variable $\xi$ and whose diagonal entries are 0; $\xi$ is a Bernoulli random variable that takes the value 1 with probability $p$ and 0 with probability $1 - p$. Thus

$$\mathbf{E}\xi = p, \qquad \operatorname{Var}\xi = p(1 - p) = \sigma^2.$$

In the 1950s, Wigner [41] discovered the famous semi-circle law for the limiting distribution of the eigenvalues of random matrices (see next section). His proof extends, without difficulty, to the adjacency matrix of $G(n,p)$, given that $np \to \infty$ with $n$.

Theorem 4.1. For $p = \omega(1/n)$, the empirical spectral distribution (ESD) of the matrix $\frac{1}{\sqrt{n}\,\sigma} A_n$ converges in distribution to the semicircle distribution, which has a density $\rho_{sc}(x)$ with support on $[-2, 2]$,

$$\rho_{sc}(x) := \frac{1}{2\pi}\sqrt{4 - x^2}.$$

If $np = O(1)$, the semicircle law no longer holds. In this case, the graph almost surely has $\Theta(n)$ isolated vertices, so in the limiting distribution, the point 0 has positive constant mass.
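The semicircle behavior of $G(n,p)$ in the dense regime can be observed numerically. The following sketch uses the normalization $A_n/(\sigma\sqrt{n})$ with $\sigma^2 = p(1-p)$; the function names are ours. Since the entries have mean $p \ne 0$, the largest eigenvalue is an outlier far from the bulk (a single eigenvalue does not affect the limiting ESD), so the sketch sets it aside when summarizing the bulk.

```python
import numpy as np

def gnp_adjacency(n, p, rng):
    """Symmetric 0/1 adjacency matrix of G(n, p) with zero diagonal."""
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
    return upper + upper.T

def normalized_spectrum(n, p, rng):
    """Eigenvalues of A_n / (sigma * sqrt(n)), sigma^2 = p(1 - p), ascending."""
    sigma = np.sqrt(p * (1.0 - p))
    return np.linalg.eigvalsh(gnp_adjacency(n, p, rng)) / (sigma * np.sqrt(n))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eig = normalized_spectrum(1000, 0.1, rng)
    bulk = eig[:-1]                       # drop the single large outlier
    print("outlier:", eig[-1])
    print("second moment of bulk:", float(np.mean(bulk ** 2)))  # semicircle value is 1
    print("fraction in [-2, 2]:", float(np.mean(np.abs(bulk) <= 2.0)))
```

The second moment of the semicircle density on $[-2,2]$ equals 1, which gives a quick sanity check without plotting a histogram.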

Let us now discuss the random $d$-regular graph (denoted by $G_{n,d}$). The spectrum of $G_{n,d}$ was considered by McKay [22] about 30 years ago. He proved that if $d$ is fixed and $n \to \infty$, then the limiting density function is

$$f_d(x) = \begin{cases} \dfrac{d\sqrt{4(d-1) - x^2}}{2\pi(d^2 - x^2)}, & \text{if } |x| \le 2\sqrt{d-1}; \\ 0, & \text{otherwise.} \end{cases}$$

This is usually referred to as the McKay or Kesten-McKay law.

[Figure: Probability Density Functions of Random Regular Graphs]

It is easy to verify that as $d \to \infty$, if we normalize the variable $x$ by $\sqrt{d-1}$, then the above density converges to the semicircle distribution on $[-2, 2]$. In fact, a numerical simulation shows that the convergence is quite fast (see the figure above).
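The convergence to the semicircle law can be checked directly from the density. The sketch below assumes the standard form of the Kesten-McKay density, $f_d(x) = d\sqrt{4(d-1)-x^2}/(2\pi(d^2-x^2))$ on $|x| \le 2\sqrt{d-1}$, and compares its rescaling by $\sqrt{d-1}$ with the semicircle density; all function names are ours.

```python
import numpy as np

def kesten_mckay_density(x, d):
    """Kesten-McKay density for degree d >= 3, evaluated at x (array-like)."""
    x = np.asarray(x, dtype=float)
    inside = np.abs(x) <= 2.0 * np.sqrt(d - 1.0)
    root = np.sqrt(np.clip(4.0 * (d - 1.0) - x ** 2, 0.0, None))
    denom = np.where(inside, d ** 2 - x ** 2, 1.0)  # dummy value outside the support
    return np.where(inside, d * root / (2.0 * np.pi * denom), 0.0)

def semicircle_density(y):
    """Semicircle density on [-2, 2]."""
    y = np.asarray(y, dtype=float)
    return np.sqrt(np.clip(4.0 - y ** 2, 0.0, None)) / (2.0 * np.pi)

def rescaled_density(y, d):
    """Density of x / sqrt(d - 1) when x follows the Kesten-McKay law."""
    s = np.sqrt(d - 1.0)
    return s * kesten_mckay_density(s * y, d)

def trapezoid(values, grid):
    """Plain trapezoidal rule."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)

if __name__ == "__main__":
    y = np.linspace(-2.0, 2.0, 2001)
    for d in (3, 5, 10, 100):
        gap = float(np.max(np.abs(rescaled_density(y, d) - semicircle_density(y))))
        print(d, gap)  # maximal pointwise distance to the semicircle density
```

The printed distances shrink as $d$ grows, in line with the remark that the convergence is fast.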

It has been conjectured that Theorem 4.1 holds for $G_{n,d}$ with $d \to \infty$. Let $A_n'$ be the adjacency matrix of $G_{n,d}$, and let $M_n'$ denote its normalized version.

Conjecture 4.2. If $d \to \infty$ then the ESD of $M_n'$ converges to the standard semicircle distribution.

Nothing had been proved about this conjecture until recently. In [11], Dumitriu and Pal showed that the conjecture holds for $d$ tending to infinity slowly, $d = n^{o(1)}$. Their method does not extend to larger $d$.

With students L. Tran and L. Wu, the PI was able to establish Conjecture 4.2 in full generality. Our method was very different from that of [11] and could be useful in the study of more sophisticated models of random graphs, since it nicely handles a major difficulty: the edges in the graph are not totally independent (unlike in the Erdős-Rényi case).

Theorem 4.3. If $d$ tends to infinity with $n$, then the empirical spectral distribution of $\frac{1}{\sqrt{n}} M_n'$ converges in distribution to the semicircle distribution.

5.  Local  statistics  of  eigenvalues 

One of the main goals of random matrix theory is to establish limiting distributions concerning the local eigenvalue statistics. To simplify the presentation, we are going to focus on Wigner Hermitian matrices, which are perhaps the most prominent model in the field.

Definition 5.1 (Wigner matrices). Let $n$ be a large number. A Wigner Hermitian matrix (of size $n$) is defined as a random Hermitian $n \times n$ matrix $M_n$ with upper triangular complex entries $\zeta_{ij} := \xi_{ij} + \sqrt{-1}\,\tau_{ij}$ ($1 \le i < j \le n$) and diagonal real entries $\xi_{ii}$ ($1 \le i \le n$) where

• For $1 \le i < j \le n$, $\xi_{ij}, \tau_{ij}$ are iid copies of a real random variable $\xi$ with mean zero and variance 1/2.

• For $1 \le i \le n$, $\xi_{ii}$ are iid copies of a real random variable $\tilde{\xi}$ with mean zero and variance 1.

• $\xi, \tilde{\xi}$ have exponential decay, i.e., there are constants $C, C'$ such that $\mathbf{P}(|\xi| \ge t^C) \le \exp(-t)$, $\mathbf{P}(|\tilde{\xi}| \ge t^C) \le \exp(-t)$, for all $t \ge C'$.

We refer to $\xi, \tilde{\xi}$ as the atom distributions of $M_n$, and $\xi_{ij}, \tau_{ij}$ as the atom variables. We refer to the matrix $W_n := \frac{1}{\sqrt{n}} M_n$ as the coarse-scale normalized Wigner Hermitian matrix, and $A_n := \sqrt{n}\, M_n$ as the fine-scale normalized Wigner Hermitian matrix.


Example. (GUE) An important special case of a Wigner Hermitian matrix is the gaussian unitary ensemble (GUE), in which $\xi, \tilde{\xi}$ are gaussian random variables with mean zero and variance 1/2, 1 respectively. The coarse-scale normalization $W_n$ is convenient for placing all the eigenvalues in a bounded interval, while the fine-scale normalization $A_n$ is convenient for keeping the spacing between adjacent eigenvalues roughly of unit size.

Given an $n \times n$ Hermitian matrix $A$, we denote its $n$ eigenvalues as

$$\lambda_1(A) \le \dots \le \lambda_n(A),$$

and write $\lambda(A) := (\lambda_1(A), \dots, \lambda_n(A))$.

The study of the eigenvalues $\lambda_i(W_n)$ of (normalized) Wigner Hermitian matrices has been one of the major topics in random matrix theory. The properties of these eigenvalues are not only interesting in their own right, but have also played essential roles in many other areas of mathematics, such as mathematical physics, probability, combinatorics, and the theory of computing.


A cornerstone of this theory is the Wigner semicircular law. Denote by $\rho_{sc}$ the semi-circle density function with support on $[-2, 2]$,

$$\rho_{sc}(x) := \begin{cases} \dfrac{1}{2\pi}\sqrt{4 - x^2}, & |x| \le 2 \\ 0, & |x| > 2. \end{cases} \quad (4)$$


Theorem 5.2 (Semi-circular law). Let $M_n$ be a Wigner Hermitian matrix. Then for any real number $x$,

$$\lim_{n \to \infty} \frac{1}{n} \bigl|\{1 \le i \le n : \lambda_i(W_n) \le x\}\bigr| = \int_{-2}^{x} \rho_{sc}(y)\, dy$$

in the sense of probability (and also in the almost sure sense, if the $M_n$ are all minors of the same infinite Wigner Hermitian matrix), where we use $|I|$ to denote the cardinality of a finite set $I$.
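Theorem 5.2 can be illustrated with a GUE-type matrix built as in Definition 5.1. In the sketch below the function names are ours, and the closed form used for $\int_{-2}^{x} \rho_{sc}(y)\,dy$ is $\frac12 + \frac{x\sqrt{4-x^2}}{4\pi} + \frac{1}{\pi}\arcsin(x/2)$, which one can verify by differentiation.

```python
import numpy as np

def wigner_hermitian(n, rng):
    """GUE-type Wigner matrix: off-diagonal real and imaginary parts N(0, 1/2),
    real diagonal entries N(0, 1)."""
    re = rng.normal(0.0, np.sqrt(0.5), (n, n))
    im = rng.normal(0.0, np.sqrt(0.5), (n, n))
    upper = np.triu(re + 1j * im, k=1)          # strictly upper triangular part
    return upper + upper.conj().T + np.diag(rng.standard_normal(n))

def semicircle_cdf(x):
    """Integral of the semicircle density from -2 to x."""
    x = np.clip(np.asarray(x, dtype=float), -2.0, 2.0)
    return 0.5 + x * np.sqrt(4.0 - x ** 2) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi

def empirical_cdf(n, x, rng):
    """Fraction of eigenvalues of W_n = M_n / sqrt(n) that are <= x."""
    eig = np.linalg.eigvalsh(wigner_hermitian(n, rng) / np.sqrt(n))
    return float(np.mean(eig <= x))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for x in (-1.0, 0.0, 1.0):
        print(x, empirical_cdf(500, x, rng), float(semicircle_cdf(x)))
```

Replacing the gaussian draws by any other atom distributions with the same first two moments should give very similar output, which is the content of the remark that only the mean and variance matter at the global scale.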


Remark 5.3. Wigner [41] proved this theorem for special ensembles. The general version above is due to Pastur [25]. The semi-circular law in fact holds under substantially more general hypotheses than those given in Definition 5.1, but we will not discuss this matter further here. One consequence of Theorem 5.2 is that we expect most of the eigenvalues of $W_n$ to lie in the interval $(-2 + \varepsilon, 2 - \varepsilon)$ for $\varepsilon > 0$ small; we shall thus informally refer to this region as the bulk of the spectrum.

Theorem 5.2 addresses the global behavior of the eigenvalues. The local properties are much harder, and their study requires much more sophisticated tools. Most of the precise theorems have been obtained for matrices with gaussian entries (GUE).

Notice  that  in  Theorem  5.2,  we  only  need  to  know  the  first  two  moments  of  the  entries  (the 
mean  and  the  variance)  to  determine  the  global  distribution.  Rather  surprisingly,  Tao  and 
the  PI  [33]  discovered  that  the  local  distributions  depend  only  on  the  first  four  moments. 
This  led  to  what  we  call  the  Four  Moment  Theorem. 

Definition 5.4 (Matching moments). Let $k \ge 1$. Two complex random variables $\xi, \xi'$ are said to match to order $k$ if one has $\mathbf{E}\,\Re(\xi)^a \Im(\xi)^b = \mathbf{E}\,\Re(\xi')^a \Im(\xi')^b$ whenever $a, b \ge 0$ are integers such that $a + b \le k$.

In the model case when the real and imaginary parts of $\xi$ or of $\xi'$ are independent, the matching moment condition simplifies to the assertion that $\mathbf{E}\,\Re(\xi)^a = \mathbf{E}\,\Re(\xi')^a$ and $\mathbf{E}\,\Im(\xi)^b = \mathbf{E}\,\Im(\xi')^b$ for all $0 \le a, b \le k$.
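To make Definition 5.4 concrete, the sketch below compares exact moments of two real random variables (so the imaginary part is zero, a simplification of the general definition): a standard gaussian and a Rademacher variable. The function names are ours.

```python
from math import isclose

def gaussian_moment(a: int) -> float:
    """E g^a for a standard real gaussian g: (a - 1)!! for even a, 0 for odd a."""
    if a % 2 == 1:
        return 0.0
    m = 1.0
    for j in range(1, a, 2):   # product 1 * 3 * ... * (a - 1)
        m *= j
    return m

def rademacher_moment(a: int) -> float:
    """E b^a for b taking the values +1 and -1 with probability 1/2 each."""
    return 1.0 if a % 2 == 0 else 0.0

def matching_order(moment_1, moment_2, max_order: int = 10) -> int:
    """Largest k <= max_order such that the a-th moments agree for all 1 <= a <= k."""
    k = 0
    for a in range(1, max_order + 1):
        if not isclose(moment_1(a), moment_2(a), abs_tol=1e-12):
            break
        k = a
    return k

if __name__ == "__main__":
    # Moments of orders 1, 2, 3 agree; the fourth moments are 3 and 1.
    print(matching_order(gaussian_moment, rademacher_moment))
```

So a Rademacher variable matches the standard gaussian to order 3 but not to order 4; a pair of atom distributions must agree in one more moment to satisfy the fourth-order matching hypothesis.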

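Theorem 5.5 (Four Moment Theorem for eigenvalues). Let $c_0 > 0$ be a sufficiently small constant. Let $M_n = (\zeta_{ij})_{1 \le i,j \le n}$ and $M_n' = (\zeta_{ij}')_{1 \le i,j \le n}$ be two Wigner matrices. Assume furthermore that for any $1 \le i < j \le n$, $\zeta_{ij}$ and $\zeta_{ij}'$ match to order 4 and for any $1 \le i \le n$, $\zeta_{ii}$ and $\zeta_{ii}'$ match to order 2. Set $A_n := \sqrt{n}\, M_n$ and $A_n' := \sqrt{n}\, M_n'$, let $1 \le k \le n^{c_0}$ be an integer, and let $G : \mathbf{R}^k \to \mathbf{R}$ be a smooth function obeying the derivative bounds

$$|\nabla^j G(x)| \le n^{c_0} \quad (5)$$

for all $0 \le j \le 5$ and $x \in \mathbf{R}^k$. Then for any $1 \le i_1 < i_2 < \dots < i_k \le n$, and for $n$ sufficiently large we have

$$\bigl|\mathbf{E}\bigl(G(\lambda_{i_1}(A_n), \dots, \lambda_{i_k}(A_n))\bigr) - \mathbf{E}\bigl(G(\lambda_{i_1}(A_n'), \dots, \lambda_{i_k}(A_n'))\bigr)\bigr| \le n^{-c_0}. \quad (6)$$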

One can use the Four Moment Theorem, combined with various recent results (in particular those of Erdős et al.; see [7]), to attack several long-standing problems. For more discussion, we refer to [7, 17, 33, 35, 36].